ABSTRACT

We present an algorithm for sound analysis and resynthesis with local automatic adaptation of time-frequency resolution. Several existing algorithms adapt the analysis window depending on its time or frequency location; in what follows we propose a method that selects the optimal resolution depending on both time and frequency. We consider an approach that we denote as analysis-weighting, from the point of view of Gabor frame theory. We analyze in particular the case of different adaptive time-varying resolutions within two complementary frequency bands; this is a typical case where perfect signal reconstruction cannot in general be achieved with fast algorithms, so that a certain error has to be minimized. We provide examples of adaptive analyses of a music sound, and outline several possibilities that this work opens.

1. INTRODUCTION 

Traditional analysis methods based on single sets of atomic functions offer limited possibilities concerning the variation of the resolution. Moreover, the optimal analysis parameters are often set depending on a priori knowledge of the signal characteristics. Analyses with a non-optimal resolution result in a blurring, or sometimes even a loss, of information about the original signal, which affects every kind of later treatment: visual representation, feature extraction and processing, among others. This motivates the research on adaptive methods, currently conducted in both the signal processing and the applied mathematics communities: they lead to analyses whose resolution changes locally according to the signal features.

We present an algorithm with local automatic adaptation of time-frequency resolution. In particular, we use nonstationary Gabor frames [1] of windows with compact time support, which make it possible to adapt the analysis window depending on its time or frequency location. For compactly supported windows fast reconstruction algorithms are possible, see [1, 2, 3]: throughout the paper we call fast the class of algorithms whose main computational cost is the Fourier transform of the signal.

In the present paper we want to go a step further and adapt the window in both time and frequency. This case has been detailed in [4], among others. It is possible, and frame theory [5] would help in providing perfect reconstruction synthesis methods (if no information is lost). However, this is a typical case where the calculation of the dual frame for the signal reconstruction cannot in general be achieved with a fast algorithm: thus a choice must be made between a slow analysis/resynthesis method guaranteeing perfect reconstruction and a fast one giving an approximation with a certain error. There are at least two interesting approaches to obtain fast algorithms:

• filter bank: the signal is first filtered with an invertible bank of P band-pass filters, to obtain P different band-limited signals; for each of these bands a different nonstationary Gabor frame $\{g^p_{k,l}\}$ of windows with compact time support is used, with $g^p_k$ the time-dependent window function. The other members of the frame are time-frequency shifts of $g^p_k$,

$$ g^p_{k,l}(t) = g^p_k(t - a^p_k)\, e^{2\pi i l b^p_k t}, \tag{1} $$

where $k, l \in \mathbb{Z}$ and $a^p_k$, $b^p_k$ are the time location and frequency step associated with the p-th frame at the time index k. We write NGF for a nonstationary Gabor frame in the time case, and we always assume we are in the painless case [6]. Each band-limited signal is perfectly reconstructed with an expansion of the analysis coefficients in the dual frame $\{\tilde g^p_{k,l}\}$; note that by this notation we denote the dual frame for a fixed p. By appropriately combining the reconstructed bands we obtain a perfect reconstruction of the original signal. An important remark is that the reconstruction at every time location is perfect as long as all the frequency coefficients within all the P analyses are used. On the other hand, for every analysis we are interested in considering only the frequency coefficients corresponding to the considered band, thus introducing a reconstruction error.
• analysis-weighting: the signal is first analyzed with P NGFs $\{g^p_{k,l}\}$ of windows with compact time support. Each analysis is associated with a certain frequency band, and its coefficients are weighted to match this association. We look for a reconstruction formula that minimizes the reconstruction error when expanding the weighted coefficients within the union of the P individual dual frames $\bigcup_{p=1}^{P} \{\tilde g^p_{k,l}\}$.
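Both approaches rely on the painless case [6], in which the frame operator of an NGF is diagonal, so that the dual windows follow from a pointwise division. The numpy sketch below illustrates this for a discrete signal treated as periodic; the function names, the strictly positive Hann-type windows and the choice "FFT size = window length" are assumptions of this illustration, not the authors' implementation.

```python
import numpy as np


def support(n0, length, N):
    """Indices of a window support starting at n0 on a periodic signal of length N."""
    return (n0 + np.arange(length)) % N


def nsgf_analysis(f, windows, positions, fft_sizes):
    """Coefficients <f, g_{k,l}> of a painless discrete nonstationary Gabor frame.

    windows[k]   : samples of g_k on its support (length <= fft_sizes[k])
    positions[k] : index in f of the first support sample of g_k
    fft_sizes[k] : number M_k of frequency channels attached to g_k
    """
    N = len(f)
    coeffs = []
    for g, n0, M in zip(windows, positions, fft_sizes):
        if len(g) > M:
            raise ValueError("painless case needs support length <= FFT size")
        seg = np.zeros(M, dtype=complex)
        seg[: len(g)] = f[support(n0, len(g), N)] * np.conj(g)
        coeffs.append(np.fft.fft(seg))
    return coeffs


def nsgf_synthesis(coeffs, windows, positions, fft_sizes, N):
    """Expansion in the canonical dual frame, dual windows g_k / sum_j M_j |g_j|^2."""
    diag = np.zeros(N)  # diagonal of the frame operator in the painless case
    for g, n0, M in zip(windows, positions, fft_sizes):
        np.add.at(diag, support(n0, len(g), N), M * np.abs(g) ** 2)
    if np.any(diag <= 0):
        raise ValueError("the windows do not cover the signal: not a frame")
    f_rec = np.zeros(N, dtype=complex)
    for c, g, n0, M in zip(coeffs, windows, positions, fft_sizes):
        idx = support(n0, len(g), N)
        # M * ifft(c) sums the modulations; multiplying by g / diag applies the dual window
        np.add.at(f_rec, idx, M * np.fft.ifft(c)[: len(g)] * g / diag[idx])
    return f_rec


# Usage: short windows (64 samples) on the first half, long ones (128) on the second
rng = np.random.default_rng(0)
N = 512
windows, positions, fft_sizes = [], [], []
for n0 in range(0, 256, 32):
    windows.append(np.hanning(66)[1:-1])   # strictly positive window of length 64
    positions.append(n0)
    fft_sizes.append(64)
for n0 in range(256, 512, 64):
    windows.append(np.hanning(130)[1:-1])  # strictly positive window of length 128
    positions.append(n0)
    fft_sizes.append(128)
f = rng.standard_normal(N)
coeffs = nsgf_analysis(f, windows, positions, fft_sizes)
f_rec = nsgf_synthesis(coeffs, windows, positions, fft_sizes, N)
```

The only cost beyond the FFTs is the pointwise division by the diagonal of the frame operator, which is what makes this case fast in the sense used above.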

We focus here on the second approach, in the basic case of two bands: we split the frequency dimension into high and low frequencies, with P = 2. We provide the algorithm for an automatic adaptation routine: in each frequency band, the best resolution is defined through the optimization of a sparsity measure deduced from the class of Renyi entropies [7]. As for the filter bank approach, the results detailed in [8] indicate a useful solution: they give an exact upper bound of the reconstruction error when reconstructing a compactly supported and essentially band-limited signal from a certain subset of its analysis coefficients within a Gabor frame.

In Section 2, the analysis-weighting method is treated with an extension of the weighted Gabor frames approach [9], which gives us a closed reconstruction formula. Section 3 is dedicated to the sparsity measures we use for the automatic adaptation, with an insight into how weighting techniques of the analysis coefficients can lead to measures with specific features. We then close the paper with some examples and an overview of the perspectives of our research.

2. RECONSTRUCTION FROM WEIGHTED FRAMES 

Let $P \in \mathbb{N}$ and let $\{g^p_{k,l}\}$, $p = 1, \dots, P$, be different NGFs, where k and l are the time and frequency location indices, respectively. We consider weight functions $0 \le w_p(\nu) < \infty$: for every p, they only depend on the frequency location. The idea is to smoothly set to zero the coefficients not belonging to the frequency portion to which the p-th analysis has been assigned; in this way, every analysis only contributes to the reconstruction of the signal portion of its pertinence, that is, high or low frequencies respectively when P = 2. For each NGF $\{g^p_{k,l}\}$ we write $c^p_{k,l} = w_p(b^p_{k,l})\,\langle f, g^p_{k,l}\rangle$ for the weighted analysis coefficients, where $b^p_{k,l} = l\, b^p_k$ is the frequency location of the atom $g^p_{k,l}$, and we consider the following reconstruction formula:

$$ \tilde f = \sum_{p=1}^{P} \sum_{k,l} c^p_{k,l}\, r(p,k,l), \tag{2} $$

where, for a fixed $\varepsilon > 0$, $\rho(\nu) = \#\{p : w_p(\nu) \ge \varepsilon\}$, and $r(p,k,l)$ is 0 if $w_p(b^p_{k,l}) < \varepsilon$, else

$$ r(p,k,l) = \frac{1}{\rho(b^p_{k,l})\, w_p(b^p_{k,l})}\, \tilde g^p_{k,l}. \tag{3} $$

Non-zero weights cancel out, since $w_p(b^p_{k,l})$ appears both in $c^p_{k,l}$ and in the denominator of $r(p,k,l)$: this reconstruction formula still makes sense, as the goal is precisely to find a reconstruction as an expansion of the weighted coefficients $c^p_{k,l}$.
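To make (2)-(3) concrete, the sketch below evaluates them for small finite frames stored as matrices (atoms as columns), with canonical duals computed directly. The random frames of $\mathbb{R}^8$, the random atom frequencies and the particular weight functions are hypothetical toy choices: with semi-normalized weights the result coincides with f, as (4) predicts, while complementary binary weights only give an approximation.

```python
import numpy as np


def canonical_dual(G):
    """Canonical dual atoms S^{-1} g of the frame whose atoms are the columns of G."""
    S = G @ G.conj().T  # frame operator
    return np.linalg.solve(S, G)


def weighted_reconstruction(f, frames, freqs, weights, eps=1e-6):
    """Evaluate equations (2)-(3) for finite frames.

    frames[p]  : N x K_p matrix whose columns are the atoms g^p_{k,l}
    freqs[p]   : frequency location b^p_{k,l} of each atom (length K_p)
    weights[p] : vectorized weight function w_p(nu) >= 0
    """
    f_rec = np.zeros(len(f), dtype=complex)
    for G, b, w in zip(frames, freqs, weights):
        wp = w(b)
        c = wp * (G.conj().T @ f)  # weighted coefficients c^p_{k,l}
        rho = sum((wq(b) >= eps).astype(int) for wq in weights)  # rho(b^p_{k,l})
        keep = wp >= eps  # r(p, k, l) = 0 for the other atoms
        scale = np.zeros(len(b))
        scale[keep] = 1.0 / (rho[keep] * wp[keep])
        f_rec += canonical_dual(G) @ (scale * c)
    return f_rec


# Toy check with two random frames of R^8 (12 atoms each)
rng = np.random.default_rng(1)
N, K = 8, 12
frames = [rng.standard_normal((N, K)) for _ in range(2)]
freqs = [rng.uniform(0.0, 1.0, K) for _ in range(2)]
f = rng.standard_normal(N)

semi_normalized = [lambda nu: 0.5 + nu, lambda nu: 2.0 - nu]
binary = [lambda nu: (nu <= 0.5).astype(float), lambda nu: (nu > 0.5).astype(float)]
f_semi = weighted_reconstruction(f, frames, freqs, semi_normalized)  # equals f, as in (4)
f_binary = weighted_reconstruction(f, frames, freqs, binary)         # only an approximation
```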

We now give an interpretation of the introduced formula. If $w_p$ is a semi-normalized sequence for each p, that is, there exist constants $m_p$ and $n_p$ such that $0 < m_p \le w_p(b^p_{k,l}) \le n_p$, and $\varepsilon < m_p$ for all p, then $\rho(\nu) = P$ and equation (2) becomes

$$ \tilde f = \frac{1}{P} \sum_{p=1}^{P} \sum_{k,l} \langle f, g^p_{k,l}\rangle\, \tilde g^p_{k,l} = f. \tag{4} $$
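The step from (2)-(3) to (4) can be spelled out; it uses only $\rho \equiv P$ and the fact that, for each fixed p, $\{\tilde g^p_{k,l}\}$ is a dual frame of $\{g^p_{k,l}\}$:

$$ \tilde f = \sum_{p=1}^{P}\sum_{k,l} w_p(b^p_{k,l})\,\langle f, g^p_{k,l}\rangle\,\frac{\tilde g^p_{k,l}}{P\,w_p(b^p_{k,l})} = \frac{1}{P}\sum_{p=1}^{P}\underbrace{\sum_{k,l}\langle f, g^p_{k,l}\rangle\,\tilde g^p_{k,l}}_{=\,f} = \frac{1}{P}\,P f = f. $$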



This is related to the concept of weighted frames detailed in [9]: under the hypothesis of semi-normalization, the sequence $\{w_p(b^p_{k,l})\, g^p_{k,l}\}$ is a frame with $\{\tilde g^p_{k,l} / w_p(b^p_{k,l})\}$ as one of its duals. For weights which are not bounded from below, but still non-zero, the reconstruction still works: the sequences $\{w_p(b^p_{k,l})\, g^p_{k,l}\}$ are no longer frames (for each p), but complete Bessel sequences (also known as upper semi-frames [10]). This reconstruction can be unstable, though.
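For real, semi-normalized weights both claims follow in one line each; here $A_p$ and $B_p$ denote frame bounds of $\{g^p_{k,l}\}$ (symbols introduced only for this check), and $w_p$ abbreviates $w_p(b^p_{k,l})$:

$$ \sum_{k,l} |\langle f, w_p\, g^p_{k,l}\rangle|^2 = \sum_{k,l} w_p^2\, |\langle f, g^p_{k,l}\rangle|^2 \in \left[ m_p^2 A_p \|f\|^2,\; n_p^2 B_p \|f\|^2 \right], $$

$$ \sum_{k,l} \langle f, w_p\, g^p_{k,l}\rangle\, \frac{\tilde g^p_{k,l}}{w_p} = \sum_{k,l} \langle f, g^p_{k,l}\rangle\, \tilde g^p_{k,l} = f. $$

The first line gives the frame bounds of the weighted sequence and the second shows that $\{\tilde g^p_{k,l}/w_p\}$ is a dual. When $w_p$ is only bounded from above, the lower bound in the first line is lost while the upper bound $n_p^2 B_p$ remains, which is the Bessel (upper semi-frame) situation.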

In our case these hypotheses are not satisfied, as we need to set to zero a certain subset of the coefficients within both analyses; thus equation (2) will in general give an approximation of f. In Section 4.2 we give an example of reconstruction following this approach and evaluate the reconstruction error. Further theoretical and numerical examinations are needed, as we are interested in finding an upper bound for the error depending on:

• the spectral features of the signal at frequencies $\nu$ where $\rho(\nu) > 1$;

• the features of the $w_p$ sequences and of the function $\rho$.

A first natural choice for the weights $w_p$ is a binary mask. First, this is the worst case in terms of reconstruction error, as we are multiplying in the frequency domain by a rectangular window before performing an inverse Fourier transform; thus the analysis of the error with a binary mask establishes a bound for the error obtained with a smoother mask. Moreover, with a binary mask the reconstruction formula takes the very simple form of equation (6), allowing a direct implementation derived from the general full-band algorithm. So we consider P = 2 and a certain cut value $\nu_c$; then

$$ w_1(\nu) = \begin{cases} 1 & \text{if } \nu \le \nu_c, \\ 0 & \text{if } \nu > \nu_c, \end{cases} \tag{5} $$

and $w_2(\nu) = 1 - w_1(\nu)$. In this case $\rho(\nu) = 1$ for every frequency $\nu$ (for any $0 < \varepsilon \le 1$), and equation (2) becomes

$$ \tilde f = \sum_{k,l\,:\, b^1_{k,l} \le \nu_c} \langle f, g^1_{k,l}\rangle\, \tilde g^1_{k,l} + \sum_{k,l\,:\, b^2_{k,l} > \nu_c} \langle f, g^2_{k,l}\rangle\, \tilde g^2_{k,l}. \tag{6} $$

The reconstruction error in this case will in general be large at frequencies close to the cut value $\nu_c$. We expect that one way to reduce this error is to let the weights $w_p$ overlap smoothly; this results in more coefficients from different analyses contributing to the reconstruction of the same portion of the signal, thus weakening their interpretation.
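The binary-mask case (5)-(6) can be sketched numerically with two stationary painless Gabor frames of different window lengths on a periodic test signal: the low band is taken from the long-window analysis and the high band from the short-window one. The window shape, hop size, sampling rate, cut frequency and test tones are assumptions chosen for illustration, and each coefficient's frequency location is taken to be its bin center frequency.

```python
import numpy as np


def gabor_analysis(f, L, a):
    """Painless stationary Gabor frame: window length = FFT size = L, hop a <= L."""
    N = len(f)
    g = np.hanning(L + 2)[1:-1]  # strictly positive window of length L
    C = np.array([np.fft.fft(f[(n0 + np.arange(L)) % N] * g) for n0 in range(0, N, a)])
    return C, g


def gabor_dual_synthesis(C, g, a, N):
    """Expansion of the (possibly masked) coefficients C in the canonical dual frame."""
    L = len(g)
    starts = range(0, N, a)
    diag = np.zeros(N)  # diagonal frame operator of the painless case
    for n0 in starts:
        diag[(n0 + np.arange(L)) % N] += L * g ** 2
    out = np.zeros(N, dtype=complex)
    for Ck, n0 in zip(C, starts):
        idx = (n0 + np.arange(L)) % N
        out[idx] += L * np.fft.ifft(Ck) * g / diag[idx]
    return out


def binary_weight(L, fs, nu_c):
    """w_1 of equation (5) evaluated at the bin centre frequencies |nu| in Hz."""
    bins = np.arange(L)
    nu = np.minimum(bins, L - bins) * fs / L
    return (nu <= nu_c).astype(float)


def two_band_approximation(f, fs, nu_c, L_low, L_high, a):
    """Equation (6): low band from long windows, high band from short windows."""
    N = len(f)
    C1, g1 = gabor_analysis(f, L_low, a)
    C2, g2 = gabor_analysis(f, L_high, a)
    w1 = binary_weight(L_low, fs, nu_c)
    w2 = 1.0 - binary_weight(L_high, fs, nu_c)
    low = gabor_dual_synthesis(C1 * w1, g1, a, N)
    high = gabor_dual_synthesis(C2 * w2, g2, a, N)
    return (low + high).real


fs = 8000
t = np.arange(fs) / fs
f = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)
# nu_c above Nyquist: w_1 = 1 everywhere, so (6) reduces to a perfect reconstruction
full = two_band_approximation(f, fs, nu_c=fs, L_low=512, L_high=128, a=64)
approx = two_band_approximation(f, fs, nu_c=1000, L_low=512, L_high=128, a=64)
max_err = np.max(np.abs(approx - f))
```

With a cut inside the spectrum the error is small but non-zero; it comes from the parts of each analysis that leak across the cut frequency, consistent with the discussion above.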

3. RENYI ENTROPY EVALUATION OF WEIGHTED SPECTROGRAMS

The representation we take into account is the spectrogram of a signal f, that is, the squared modulus of the Short-Time Fourier Transform (STFT) of f with window g, defined by

$$ V_g f(t, \omega) = \int_{\mathbb{R}} f(\tau)\, \overline{g(\tau - t)}\, e^{-2\pi i \omega \tau}\, d\tau, \tag{7} $$

so that the spectrogram is $PS_f(t, \omega) = |V_g f(t, \omega)|^2$. Given a Gabor frame $\{g_{k,l}\}$, we obtain a sampling of the spectrogram coefficients by considering $z_{k,l} = |\langle f, g_{k,l}\rangle|^2$. With an appropriate normalization, both the continuous and the sampled spectrogram can be interpreted as probability densities. The idea of using Renyi entropies as sparsity measures for time-frequency distributions was introduced in [7]: minimizing the complexity or information of a set of time-frequency representations of the same signal is equivalent to maximizing the concentration, peakiness, and therefore the sparsity of the analysis. Thus we consider as best analysis the sparsest one, according to the minimal entropy evaluation.

Given a signal f and its spectrogram $PS_f$, the Renyi entropy of order $\alpha > 0$, $\alpha \ne 1$, of $PS_f$ is defined as follows:

$$ H^R_\alpha(PS_f) = \frac{1}{1-\alpha}\log_2 \iint_R \left( \frac{PS_f(t,\omega)}{\iint_R PS_f(t',\omega')\,dt'\,d\omega'} \right)^{\alpha} dt\,d\omega, \tag{8} $$

where $R \subseteq \mathbb{R}^2$, and we omit its indication if equality holds. Given a discrete spectrogram obtained through the Gabor frame $\{g_{k,l}\}$, we consider R as a rectangle of the time-frequency plane, $R = [t_1, t_2] \times [\omega_1, \omega_2] \subseteq \mathbb{R}^2$. It identifies a set of points G on the sampling grid defined by the frame. As a discretization of the original continuous spectrogram, every sample $z_{k,l} = |\langle f, g_{k,l}\rangle|^2$ is related to a time-frequency region of area ab, where a and b are respectively the time and frequency steps; we thus obtain the discrete Renyi entropy measure directly from (8):

$$ H^R_\alpha[PS_f] = \frac{1}{1-\alpha}\log_2 \sum_{[k,l]\in G} \left( \frac{z_{k,l}}{\sum_{[k',l']\in G} z_{k',l'}} \right)^{\alpha} + \log_2(ab). \tag{9} $$
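A direct transcription of (9) in numpy, offered as a sketch: the function name and the treatment of $\alpha = 1$ as the Shannon limit are choices of this illustration. Zero samples are dropped before exponentiation, which for $\alpha = 0$ makes the sum count the non-zero samples.

```python
import numpy as np


def renyi_entropy(z, alpha, a=1.0, b=1.0):
    """Discrete Renyi entropy (9) of the spectrogram samples z over the set G.

    z     : non-negative samples z_{k,l} = |<f, g_{k,l}>|^2 (any shape)
    alpha : order >= 0; alpha == 1 is handled as the Shannon limit
    a, b  : time and frequency steps of the sampling grid
    """
    z = np.asarray(z, dtype=float).ravel()
    if np.any(z < 0) or z.sum() == 0:
        raise ValueError("z must be non-negative with a positive sum")
    p = z / z.sum()
    p = p[p > 0]  # zero samples contribute nothing; for alpha = 0 the sum counts non-zeros
    if np.isclose(alpha, 1.0):
        h = -np.sum(p * np.log2(p))  # Shannon entropy, the alpha -> 1 limit
    else:
        h = np.log2(np.sum(p ** alpha)) / (1.0 - alpha)
    return float(h + np.log2(a * b))


uniform = renyi_entropy(np.ones(16), alpha=2)                # log2(16) = 4 bits
spike = renyi_entropy([0, 0, 5, 0], alpha=3)                 # single non-zero sample: 0
shannon = renyi_entropy(np.ones(16), alpha=1)                # 4 bits as well
shifted = renyi_entropy(np.ones(16), alpha=2, a=2.0, b=4.0)  # 4 + log2(8) = 7
```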

We now consider another weight function $0 \le w(k,l) < \infty$. Instead of weighting the STFT coefficients $\langle f, g_{k,l}\rangle$ as we did in Section 2, we weight here the discrete spectrogram, obtaining a new distribution $z^w_{k,l} = w(k,l)\, z_{k,l}$, which is not necessarily the spectrogram of a signal; nevertheless, since the weights are non-negative, its Renyi entropy can still be evaluated from (9). This value gives information about the concentration of the distribution within the time-frequency area emphasized by the specific weight function: as we show in Section 4.1, this can be useful for customizing the adaptation procedure.
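A minimal sketch of the weighted evaluation, assuming a fixed sampling grid so that the constant $\log_2(ab)$ term of (9) can be dropped; the toy spectrogram and the two masks are hypothetical. Because the weighted distribution is renormalized inside (9), each mask measures concentration only within the area it emphasizes.

```python
import numpy as np


def renyi_entropy(z, alpha):
    """Renyi entropy (9) without the constant log2(ab) term (fixed grid assumed)."""
    p = np.asarray(z, dtype=float).ravel()
    p = p / p.sum()
    p = p[p > 0]
    if np.isclose(alpha, 1.0):
        return float(-np.sum(p * np.log2(p)))
    return float(np.log2(np.sum(p ** alpha)) / (1.0 - alpha))


def weighted_entropy(Z, W, alpha):
    """Entropy of z^w_{k,l} = w(k,l) z_{k,l}; Z and W share the shape (time, frequency)."""
    Z, W = np.asarray(Z, dtype=float), np.asarray(W, dtype=float)
    if np.any(W < 0):
        raise ValueError("weights must be non-negative")
    ZW = W * Z
    if ZW.sum() == 0:
        raise ValueError("the weight function cancels the whole distribution")
    return renyi_entropy(ZW, alpha)


# Toy spectrogram: rows are time frames, columns frequency bins (low -> high)
Z = np.array([[9.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0, 1.0]])
low = np.zeros_like(Z)
low[:, :2] = 1.0                             # emphasize the two lowest bins
high = 1.0 - low                             # emphasize the two highest bins
h_low = weighted_entropy(Z, low, alpha=2)    # a single peak remains: entropy 0
h_high = weighted_entropy(Z, high, alpha=2)  # four equal samples: entropy 2
```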

We focus on discretized spectrograms with a finite number of coefficients, as digital signal processing requires working with finite sampled signals and distributions. As $\alpha$ tends to one this measure converges to the Shannon entropy, which is therefore included in this larger class. General properties of Renyi entropies can be found in [11], [12] and [13]; in particular, given a probability density P, $H_\alpha(P)$ is a non-increasing function of $\alpha$, so $\alpha_1 < \alpha_2 \Rightarrow H_{\alpha_1}(P) \ge H_{\alpha_2}(P)$. Moreover, for every order $\alpha$ the Renyi entropy $H_\alpha$ is maximum when P is uniformly distributed, while it is minimum and equal to zero when P has a single non-zero value. As we are working with finite discrete densities we can also consider the case $\alpha = 0$, which is simply the logarithm of the number of non-zero elements of P; as a consequence $H_0[P] \ge H_\alpha[P]$ for every admissible order $\alpha$. As long as we can give an interpretation to the $\alpha$ parameter, this class of measures offers much more detailed information about the time-frequency representation of the signal.
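The properties listed above can be checked numerically on a random finite density; the seed, the size and the $\alpha$ grid below are arbitrary choices made for this illustration.

```python
import numpy as np


def renyi(p, alpha):
    """Renyi entropy (bits) of a finite discrete density; alpha = 1 gives Shannon."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    if np.isclose(alpha, 1.0):
        return float(-np.sum(p * np.log2(p)))
    return float(np.log2(np.sum(p ** alpha)) / (1.0 - alpha))


rng = np.random.default_rng(0)
p = rng.random(64) + 1e-3            # random density with 64 non-zero values
alphas = np.linspace(0.0, 3.0, 31)
H = np.array([renyi(p, a) for a in alphas])

non_increasing = bool(np.all(np.diff(H) <= 1e-12))      # alpha1 < alpha2 => H1 >= H2
h0_is_log_count = bool(np.isclose(H[0], np.log2(64)))   # alpha = 0: log2(#non-zero)
uniform_is_max = renyi(np.ones(64), 2.0) >= renyi(p, 2.0)
spike_is_zero = abs(renyi([0.0, 3.0, 0.0], 2.0)) < 1e-12
```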



3.1. Adaptive procedure

We choose a finite set S of admissible scaling factors and build different scaled versions of a window g,

$$ g_s(t) = \frac{1}{\sqrt{s}}\, g\!\left(\frac{t}{s}\right), \tag{10} $$

so that the discretized temporal support of the scaled windows $g_s$ still remains inside G for any $s \in S$. In our case, G is a rectangle with the analyzed time segment as horizontal dimension and the whole frequency lattice as vertical one: at each step of our algorithm, this rectangle is shifted forward in time, with a certain overlap with its previous position. Having fixed $\alpha$, the sparsest local analysis is defined to be the one with minimum Renyi entropy: the optimization is thus performed on the scaling factor s, and the best window is defined accordingly, with an approach similar to the one developed in [14]. With the weight functions introduced above, we are also able to limit the frequency range of the rectangle G at each time location: adaptation is thus obtained over the time dimension for each weighted spectrogram, that is, in our case, for each enhanced frequency band. An interpolation is performed over the overlapping zones to avoid abrupt discontinuities in the transition between resolutions: in the examples given in Section 4, the spectrogram segment for the entropy evaluation includes four spectrogram frames of the largest window, and the overlapping zone corresponds to three frames of the largest window; the temporal sizes of the segment and of the overlap are deduced accordingly. The time-frequency adapted analysis of the global signal is finally obtained by assembling the slices of local sparsest analyses computed with the selected windows.

3.2. Biasing spectral coefficients through the α parameter

The $\alpha$ parameter in equation (8) introduces a bias on the spectral coefficients. To obtain a qualitative description of this bias, we first consider a collection of simple spectrograms composed of a variable number of large and small coefficients. We generate a vector D of length N = 100 whose entries are random numbers between 0 and 1 drawn from a normal distribution; we then consider the vectors $D_M$, $1 \le M \le N$, such that

$$ D_M[k] = \begin{cases} D[k] & \text{if } k \le M, \\ D[k]/20 & \text{if } k > M, \end{cases} \tag{11} $$

and normalize them to obtain a unitary sum. We then apply Renyi entropy measures with $\alpha$ varying between 0 and 3, the range of values we usually adopt for music signals. As we see from Figure 1, there is a relation between the number of large coefficients M and the slope of the entropy curves for the different values of $\alpha$. For $\alpha = 0$, $H_0[D_M]$ is the logarithm of the number of non-zero coefficients, and it is therefore constant; as $\alpha$ increases, densities with a small number of large coefficients decrease their entropy faster than the almost flat vectors corresponding to larger values of M. This means that by increasing $\alpha$ we emphasize the difference between the entropy of a peaky distribution and that of a nearly flat one. Since our sparsity measure selects as best analysis the one with minimal entropy, reducing $\alpha$ raises the probability that less peaky distributions are chosen as sparsest: in principle, this is desirable, as weaker components of the signal, such as partials, have to be taken into account in the sparsity evaluation.

Figure 1: Renyi entropy evaluations of the $D_M$ vectors with varying $\alpha$; the distribution becomes flatter as M increases. Therefore increasing $\alpha$ favors a sparse representation (see text).

The second example we consider shows that the principle just mentioned should be applied with care, as a small coefficient in a spectrogram could be determined by a partial as well as by a noise component; with an extremely small $\alpha$, the best window selected could vary without a reliable relation to spectral concentration, depending on the noise level within the sound. We illustrate how noise has to be taken into account when tuning the $\alpha$ parameter by means of another model of spectrogram: taking the same vector D considered previously, and two integers $1 < N_{part}$ and $1 < R_{part}$, we define $D_L$ as follows:

$$ D_L[k] = \begin{cases} 1 & \text{if } k = 1, \\ 1/R_{part} & \text{if } 1 < k \le N_{part}, \\ D[k]/R_r & \text{if } k > N_{part}, \end{cases} \tag{12} $$

where $R_r = R_{part}/L$, with $0 < L \le 1$; then we normalize to obtain a unitary sum. These vectors are a simplified model of the spectrogram of a signal whose coefficients correspond to one main peak, $N_{part}$ partials with amplitude reduced by $R_{part}$, and some noise whose amplitude varies, proportionally to the L parameter, from a negligible level to that of the partials. Applying Renyi entropy measures with $\alpha$ varying between 0 and 3, we obtain Figure 2, which shows the impact of the noise level L on the evaluations with different values of $\alpha$.

Figure 2: Renyi entropy evaluations of the $D_L$ vectors with varying $\alpha$, $N_{part} = 5$ and $R_{part} = 2$; the entropy values rise differently as L increases, depending on $\alpha$: this shows that the impact of the noise level on the entropy evaluation depends on the entropy order (see text).

The increment of L corresponds to a strengthening of the noise coefficients, causing the entropy values to rise for any $\alpha$. The key point is how they rise, depending on $\alpha$: the convexity of the surface in Figure 2 increases as $\alpha$ becomes larger, and it describes the impact of the noise level on the evaluation; the stronger convexity when $\alpha$ is around 3 denotes a higher robustness, as the noise level needs to be high to cause a significant entropy variation. Our tests show that, as a drawback, this lowers the sensitivity of the evaluation to the partials, and the measure keeps almost the same profile for every $R_{part} > 1$. On the other hand, when $\alpha$ tends to 0, the entropy growth is almost linear in L, showing the significant impact of noise on the evaluation, as well as a finer response to the variation of the partials' amplitude. As a consequence, the $\alpha$ parameter has to be tuned according to the desired tradeoff between the sensitivity of the measure to the weak signal components to be observed and its robustness to noise. In our experience, the value 0.7 is appropriate for both speech and music signals.
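The first experiment of Section 3.2 can be reproduced in outline with the sketch below. The text only summarizes how the vector D was drawn, so the clipped normal distribution used here is an assumption, and only a few values of M and $\alpha$ are evaluated.

```python
import numpy as np


def renyi(p, alpha):
    """Renyi entropy (bits) of a finite discrete density; alpha = 1 gives Shannon."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    if np.isclose(alpha, 1.0):
        return float(-np.sum(p * np.log2(p)))
    return float(np.log2(np.sum(p ** alpha)) / (1.0 - alpha))


def make_dm(D, M):
    """Equation (11): keep the first M values of D, divide the others by 20, normalize."""
    DM = D / 20.0
    DM[:M] = D[:M]  # k <= M in 1-based indexing -> indices 0..M-1
    return DM / DM.sum()


rng = np.random.default_rng(0)
N = 100
# Assumed stand-in for "random numbers between 0 and 1 drawn from a normal distribution"
D = np.clip(rng.normal(0.5, 0.2, N), 1e-3, 1.0)

alphas = [0.0, 0.5, 1.0, 2.0, 3.0]
H = {M: [renyi(make_dm(D, M), a) for a in alphas] for M in (5, 50, 95)}
for M, row in H.items():
    print(M, " ".join(f"{h:6.3f}" for h in row))
```

At $\alpha = 0$ every row equals $\log_2 100$, while at larger orders the peaky vector (M = 5) drops well below the nearly flat one (M = 95), which is the qualitative behavior described for Figure 1.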



4. ALGORITHMS AND EXAMPLES 

We give here two examples of the methods described above: the first shows an application of two different weights on the spectrogram of a given sound, which determine two different choices for the optimal resolutions; the second is a reconstruction with the algorithm detailed in Section 2.

4.1. Adaptation with Different Masks 

We can favor a certain subset of the analysis coefficients to drive the adaptation routine, instead of considering them all with the same importance. For example, the adaptation within the p-th band could be determined from the coefficients lying at a certain small distance from the central frequency of the band.

Figures 3 and 4 are obtained with an improved version of the algorithm described in [15], which allows a weighting of the analysis coefficients that concerns only the adaptation routine, and not the analysis and resynthesis. Thus we obtain different adapted analyses depending on the frequency area we wish to favor, while still preserving perfect reconstruction. The sound we analyze is a music signal with a bass guitar, a drum set and a female singing voice starting from second 1.54. We use two complementary binary masks, the first setting to zero the spectrogram coefficients corresponding to frequencies higher than 300 Hz, the second doing the opposite. As we can see in Figure 3, with the first mask we obtain an analysis where the largest window is favored; this gives the best frequency resolution for the bass guitar sound, which is prominent in the considered band. The only points where shorter windows are chosen correspond to strong transients, such as bass or voice attacks, where time precision is enhanced.
With the second mask, low frequencies are ignored in the adaptation step, and as a consequence we obtain a different optimal analysis: the smallest window is generally selected, yielding a higher time resolution, which is best adapted to the percussive sounds; moreover, the largest window is chosen where the singing voice is present, since its higher harmonics belong to the considered band and make a better frequency resolution preferable.
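The mask-driven selection of Sections 3.1 and 4.1 can be sketched in a simplified, global form: for each candidate window length, compute the spectrogram, keep only the bins allowed by a binary frequency mask, and choose the length with minimal Renyi entropy. This sketch omits the sliding rectangle G, the interpolation over overlaps and the scaling (10). The test signals, the 8 kHz sampling rate, the two window lengths and the hop of half a window (which keeps the number of coefficients equal across lengths) are assumptions, while $\alpha = 0.7$ follows Section 3.2.

```python
import numpy as np


def spectrogram(f, L):
    """z_{k,l} = |<f, g_{k,l}>|^2, painless Gabor frame: window length = FFT size = L, hop L/2."""
    N, a = len(f), L // 2
    g = np.hanning(L + 2)[1:-1]
    return np.array([np.abs(np.fft.fft(f[(n0 + np.arange(L)) % N] * g)) ** 2
                     for n0 in range(0, N, a)])


def band_mask(L, fs, nu_c, low):
    """Binary mask on the bins: |frequency| <= nu_c (low=True) or > nu_c (low=False)."""
    bins = np.arange(L)
    nu = np.minimum(bins, L - bins) * fs / L
    return (nu <= nu_c) if low else (nu > nu_c)


def renyi(z, alpha):
    p = np.asarray(z, dtype=float).ravel()
    p = p / p.sum()
    p = p[p > 0]
    if np.isclose(alpha, 1.0):
        return float(-np.sum(p * np.log2(p)))
    return float(np.log2(np.sum(p ** alpha)) / (1.0 - alpha))


def best_window(f, fs, sizes, nu_c, low, alpha=0.7):
    """Window length whose masked spectrogram has minimal Renyi entropy."""
    scores = {L: renyi(spectrogram(f, L) * band_mask(L, fs, nu_c, low), alpha)
              for L in sizes}
    return min(scores, key=scores.get), scores


fs, N = 8000, 8192
t = np.arange(N) / fs
tone = np.sin(2 * np.pi * 125 * t)  # stationary low-frequency component
clicks = np.zeros(N)
clicks[::2048] = 1.0                # sparse transients
best_low, _ = best_window(tone, fs, (256, 1024), nu_c=300, low=True)
best_high, _ = best_window(clicks, fs, (256, 1024), nu_c=300, low=False)
```

For the stationary tone the low-band mask selects the long window, while for the clicks the high-band mask selects the short one, mirroring the behavior described for Figures 3 and 4.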
Figure 3: Adaptive analysis with a mask privileging frequencies below 300 Hz, on a music signal with a bass guitar, a drum set and a female singing voice starting from second 1.54: on top, best window size chosen as a function of time; at the bottom, adapted spectrogram of the analyzed sound file.




Figure 4: Adaptive analysis with a mask privileging frequencies above 300 Hz, on a music signal with a bass guitar, a drum set and a female singing voice starting from second 1.54: on top, best window size chosen as a function of time; at the bottom, adapted spectrogram of the analyzed sound file.



In both cases we calculate the difference between the reconstructed signal and the original one. We use a 16-bit audio file whose amplitude is represented in the range [−1, 1] with double precision: the maximum absolute value of the differences between corresponding time samples, as well as the root mean square error over the entire signal, are both of the order of $10^{-16}$.

4.2. Analysis-Weighting Example 

We show here an example of the approximation of a signal by applying formula (6), within the analysis-weighting approach using a binary mask: as detailed in Sections 2 and 3, we analyze a signal with different stationary Gabor frames. The sound we consider is the same as in Section 4.1 and the binary mask is still obtained with a cut frequency of 300 Hz, while the sampling rate is 44.1 kHz. We modify the coefficients of all these analyses with the mask $w_1(\nu)$, and build the NGF $\{g^1_{k,l}\}$ with resolutions adapted to the low frequencies by optimizing the entropy of the masked analyses. Then we repeat this step with the mask $w_2(\nu)$ and build the NGF $\{g^2_{k,l}\}$. We finally calculate the duals of the two NGFs, which can be done in these cases with fast algorithms, and resynthesize the two signal bands: for these examples, the reconstruction is performed with the SuperVP phase vocoder by Axel Röbel [16].
Figure 5 shows the spectrogram of the lower signal band, reconstructed with the low-frequency adapted analysis. This spectrogram is computed with a fixed window, the largest one within the considered set; the choice of the best window is shown as well, to indicate how the reconstruction is performed at each time. Figure 6 is obtained in the same way for the upper band reconstruction. The approximation of the original sound is then given by the sum of the two bands.

The reconstruction error we obtain is higher than in the previous examples: the maximum absolute value of the sample differences is 0.0568, while the root mean square error is 0.0099. With a binary mask, the only way to reduce the error is to set the cut frequency in a range where the signal energy is low; unfortunately, music signals generally do not have large low-energy bands. Moreover, the interest of our method lies in the possibility of a variable cut frequency, in order to freely select the adaptation criterion.

Figure 7 shows the spectrogram of the difference between the original sound and the reconstructed one: the spectral content of the error is concentrated at the cut frequency. The alteration introduced has negligible perceptual effects, so that the original signal and the reconstruction are hard to distinguish. This aspect needs to be quantified, since for the approximation of music signals objective error measures do not give any information about the perceptual meaning of the error. The accuracy of a method thus has to be evaluated by means of measures that take the human auditory system into account, as well as by listening tests.

Another element to consider is the overlap between the weight functions introduced in Section 2: if we allow them to overlap over a sufficiently large frequency band, we expect the error to be reduced. The reason can be clarified by considering the causes of the reconstruction error: windows with compact time support cannot have a compactly supported Fourier transform; from the analysis point of view, this means that a spectrogram coefficient affects the signal reconstruction along the whole frequency dimension. We can limit such an influence by choosing well-localized time-frequency atoms: even if their frequency support is not compact, they decay fast outside a certain region. If we cut with a binary mask outside a certain band, the reconstruction error comes mainly from setting to zero the contribution of atoms whose Fourier transforms spread into the band of interest; if the atoms are well localized, only a few of them actually have an impact.
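The decay argument can be illustrated by comparing the spectral leakage of a rectangular window and of a Hann window of the same length, measured 10.5 bins away from the main lobe. The window length and the offset are arbitrary illustrative choices; the point is only that the better-localized window leaks far less energy into distant frequencies.

```python
import numpy as np


def relative_leakage_db(window, offset_bins, oversample=16):
    """|DFT| of a window at offset_bins bins (units of 1/len(window)) from its peak, in dB re peak."""
    L = len(window)
    W = np.abs(np.fft.fft(window, oversample * L))  # zero-padded, finely sampled spectrum
    k = int(round(offset_bins * oversample))
    return float(20 * np.log10(W[k] / W[0]))


L = 256
rect = relative_leakage_db(np.ones(L), 10.5)    # about -30 dB
hann = relative_leakage_db(np.hanning(L), 10.5)  # several tens of dB lower
```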
Formula (2) gives an ideal reference: if the overlap covers the entire frequency dimension, all weights are non-zero, and we thus have perfect reconstruction from the weighted coefficients. When some weights are zero and the weight functions do overlap, the normalization factor $\rho$ in the reconstruction formula is greater than one in the overlapping frequency interval. This reduces the impact of the errors coming from individual resyntheses; on the other hand, summing them all imposes a limit on the achievable global error reduction.

Figure 5: Low-frequency reconstruction from the masked adapted analysis of a music signal with a bass guitar, a drum set and a female singing voice starting from second 1.54: on top, best window size chosen as a function of time. At the bottom, spectrogram of the analyzed band with a 4096-sample Hamming window, 3072-sample overlap and 4096 frequency points; the frequency axis is bounded to 2 kHz to focus on the reconstructed region.

A further improvement of this formula is to put different weights in the denominator in (4), effectively amplifying or reducing the contributions coming from individual coefficients. To keep perfect reconstruction valid in the case of semi-normalized weights, one possibility is to obtain the different weights as a function of the analysis weights, depending also on the overlap.
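To see where the normalization factor exceeds one, the sketch below evaluates $\rho(\nu)$ for two smoothly overlapping complementary weights. The raised-cosine crossover shape, the 300 Hz cut and the 100 Hz overlap width are hypothetical choices; the text does not fix a particular smooth weight.

```python
import numpy as np


def crossover_weights(nu, nu_c, width):
    """Hypothetical complementary weights (w1 + w2 = 1) overlapping on
    [nu_c - width/2, nu_c + width/2]; outside that band they reduce to the binary mask (5)."""
    x = np.clip((np.asarray(nu, dtype=float) - (nu_c - width / 2)) / width, 0.0, 1.0)
    w2 = np.sin(0.5 * np.pi * x) ** 2
    return 1.0 - w2, w2


def rho(weights, eps=1e-6):
    """rho(nu): number of weights that are not negligible (w_p(nu) >= eps), pointwise."""
    return np.sum([w >= eps for w in weights], axis=0)


nu = np.array([0.0, 260.0, 300.0, 340.0, 1000.0])
w1, w2 = crossover_weights(nu, nu_c=300.0, width=100.0)
r = rho([w1, w2])  # 1 outside the overlap, 2 inside it
```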




Figure 6: High-frequency reconstruction from the masked adapted analysis of a music signal with a bass guitar, a drum set and a female singing voice starting from second 1.54: on top, best window size chosen as a function of time; at the bottom, spectrogram of the analyzed band with a 4096-sample Hamming window, 3072-sample overlap and 4096 frequency points.



5. CONCLUSIONS AND PERSPECTIVES 

We have sketched the first steps of a promising research project on the local automatic adaptation of time-frequency sound representations. A first question that arises is how to display a representation of the signal such as the one described; there are two possibilities involving weighted means of the coefficients at a certain time-frequency location:

• a weighted mean $d_{k,l}$ of the complex coefficients $c^p_{k,l}$ over p, displaying $|d_{k,l}|$, or

• a weighted mean $d^{(A)}_{k,l}$ of the squared moduli $|c^p_{k,l}|^2$ over p.

In a previously proposed method [15], the algorithm keeps the original coefficients in memory; with this approach, we can use the reconstruction scheme mentioned in (13). A further open question is how to reconstruct the signal from an expansion of the $d_{k,l}$ or $d^{(A)}_{k,l}$ coefficients; simple numerical experiments could give some insight.

If $d^{(A)}_{k,l}$ is used, we also have to address the problem of the phase. This approach is useful when dealing with spectrogram transformations where the phase information is lost, as with the reassigned spectrogram or the spectral cepstrum. We could either use an iterative approach, like the one described in [17], adapted to frame theory, or use a system with a high redundancy (see [18]).

Figure 7: Spectrogram of the reconstruction error given by the described method on a music signal with a bass guitar, a drum set and a female singing voice starting from second 1.54; spectrogram obtained with a 4096-sample Hamming window, 3072-sample overlap and 4096 frequency points.

From a computational point of view, we are interested in limiting the size of the signal for the direct and inverse Fourier transforms in (2), as this would largely improve the efficiency of the algorithm. A different form of formula (2) in this sense is

/\£<^(^(J§))) <>3) 

whose properties have to be further investigated. 

We would later also investigate the properties of time-variant filters obtained by multiplying these new sets of coefficients, resulting in new kinds of frame multipliers [19]. Using an optimized way to analyze acoustical signals will, therefore, also lead to better control of such adaptive filters.